empirical fisher
Failure case 2: Augerino depends on the parameterisation of invariance that is used.

The full GGN approximation in Eq. 5 is in O(NP^2 C) for computing the N matrix products. The diagonal GGN approximation would be in O(NPC), and computation of the log-determinant only O(P). Computing the log-determinant can be done efficiently in O(D^3 + G^3) by decomposing the Kronecker factors (Immer et al., 2021a). The last two terms, which depend on S, arise from the aggregation of augmentation samples in our approximation, that is, the expectations over a and g in the second line of Eq. 15.
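To make the O(D^3 + G^3) claim concrete: the eigenvalues of a Kronecker product A ⊗ B are all pairwise products of the factor eigenvalues, so the log-determinant can be computed from two small eigendecompositions without ever forming the full DG x DG matrix. Below is a minimal NumPy sketch under that assumption; the function name and the isotropic prior precision delta (added as in Laplace-approximation settings) are illustrative, not the paper's exact implementation.

```python
import numpy as np

def kron_logdet_with_prior(A, B, delta=1.0):
    """log det(A kron B + delta * I) in O(D^3 + G^3) time.

    A: (D, D) symmetric PSD Kronecker factor.
    B: (G, G) symmetric PSD Kronecker factor.
    delta: illustrative isotropic prior precision.

    The eigenvalues of A kron B are all products lam_i * mu_j of the
    factor eigenvalues, so the (D*G, D*G) matrix is never formed.
    """
    lam = np.linalg.eigvalsh(A)   # O(D^3)
    mu = np.linalg.eigvalsh(B)    # O(G^3)
    return float(np.log(np.outer(lam, mu) + delta).sum())

# Sanity check against the dense O((DG)^3) computation on tiny factors.
rng = np.random.default_rng(0)
D, G = 4, 3
A = rng.standard_normal((D, D)); A = A @ A.T
B = rng.standard_normal((G, G)); B = B @ B.T
dense = np.kron(A, B) + np.eye(D * G)
assert np.isclose(kron_logdet_with_prior(A, B), np.linalg.slogdet(dense)[1])
```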
An Improved Empirical Fisher Approximation for Natural Gradient Descent
Approximate Natural Gradient Descent (NGD) methods are an important family of optimisers for deep learning models, which use approximate Fisher information matrices to pre-condition gradients during training. The empirical Fisher (EF) method approximates the Fisher information matrix empirically by reusing the per-sample gradients collected during back-propagation. Despite its ease of implementation, the EF approximation has theoretical and practical limitations. This paper investigates an underlying issue of EF, which is shown to be a major cause of its poor empirical approximation quality. An improved empirical Fisher (iEF) method is proposed to address this issue, which is motivated as a generalised NGD method from a loss reduction perspective, while retaining the practical convenience of EF.
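For context, the baseline EF approximation referenced in the abstract can be assembled directly from the per-sample gradients that back-propagation already produces: EF = (1/N) * sum_n g_n g_n^T, typically damped before inversion. The NumPy sketch below shows only this standard EF preconditioner, not the paper's proposed iEF variant; function names, the damping constant, and the learning rate are illustrative.

```python
import numpy as np

def empirical_fisher(per_sample_grads, damping=1e-3):
    """Damped empirical Fisher from per-sample gradients.

    per_sample_grads: (N, P) array whose n-th row is the gradient of
        the loss on example n w.r.t. all P parameters.
    Returns (1/N) * sum_n g_n g_n^T + damping * I, the matrix EF
    methods use to precondition the mean gradient.
    """
    n_samples, n_params = per_sample_grads.shape
    ef = per_sample_grads.T @ per_sample_grads / n_samples
    return ef + damping * np.eye(n_params)

def ef_ngd_step(mean_grad, ef, lr=0.1):
    """One EF-preconditioned update direction: -lr * EF^{-1} g."""
    return -lr * np.linalg.solve(ef, mean_grad)
```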
We would like to thank the reviewers for their comments, and take the opportunity to answer their questions below.
- We thank the reviewer for the relevant [Amari et al., 2000] reference, which we will cite and discuss. Similarly, [Amari et al., 2000] considers single-layer networks.
- Further, we examined the method's accuracy relative to recent techniques, and extended it to
- We are open to changing the term "WoodFisher", which we used as a mnemonic.
- Please see Appendix S5 for ablation studies.
- For simplicity, we consider the scaling constant as 1 here.
- Thanks for the suggestions; we will correct the font sizes & the broken references.
On the Computation of the Fisher Information in Continual Learning
Continual learning is a rapidly growing subfield of deep learning devoted to enabling neural networks to incrementally learn new tasks, domains or classes while not forgetting previously learned ones. Such continual learning is crucial for addressing real-world problems where data are constantly changing, such as in healthcare, autonomous driving or robotics. Unfortunately, continual learning is challenging for deep neural networks, mainly due to their tendency to forget previously acquired skills when learning something new. Elastic Weight Consolidation (EWC) [1], developed by Kirkpatrick and colleagues from DeepMind, is one of the most popular methods for continual learning with deep neural networks. To this day, this method is featured as a baseline in a large proportion of continual learning studies. However, in the original paper the exact implementation of EWC was not well described, and no official code was provided. A previous blog post by Huszár [2] already addressed an issue relating to how EWC should behave when there are more than two tasks.
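For readers unfamiliar with the mechanics: EWC stores a (typically diagonal) Fisher estimate F and the parameters theta* at the end of a task, then regularises later training with the penalty (lambda/2) * sum_i F_i * (theta_i - theta*_i)^2. The PyTorch sketch below uses a simple squared batch-gradient estimate of the diagonal Fisher; how exactly that Fisher term should be computed (per-sample vs. batch gradients, observed labels vs. labels sampled from the model) is precisely the implementation question at issue here, so this is one possible variant rather than the canonical one.

```python
import torch

def diagonal_fisher(model, data_loader, loss_fn):
    """Diagonal Fisher estimate at the current parameters.

    Uses squared batch gradients of the training loss; other
    implementations use per-sample gradients or labels sampled
    from the model's own predictive distribution instead.
    """
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    model.eval()
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / len(data_loader) for n, f in fisher.items()}

def ewc_penalty(model, fisher, star_params, lam=1.0):
    """Quadratic EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2."""
    penalty = torch.zeros(())
    for n, p in model.named_parameters():
        penalty = penalty + (fisher[n] * (p - star_params[n]) ** 2).sum()
    return 0.5 * lam * penalty
```

When training on a new task, the total objective is then the new task's loss plus ewc_penalty(model, fisher, star_params, lam), with fisher and star_params frozen from the end of the previous task.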